Debugging Inputs
When a program fails to process an input, it need not be the program code that is at fault. It can also be that the input data is faulty, for instance as a result of data corruption. To get the data processed, one then has to debug the input data—that is, (1) identify which parts of the input data prevent processing, and (2) recover as much of the (valuable) input data as possible. In this paper, we present a general-purpose algorithm called ddmax that addresses these problems automatically. Through experiments, ddmax maximizes the subset of the input that can still be processed by the program, thus recovering and repairing as much data as possible; the difference between the original failing input and the "maximized" passing input includes all input fragments that could not be processed. To the best of our knowledge, ddmax is the first approach that fixes faults in the input data without requiring program analysis. In our evaluation, ddmax repaired about 69% of input files and recovered about 78% of data within one minute per input.
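The core idea—maximizing the passing subset of a failing input purely by running the program as a black box—can be illustrated with a simplified greedy sketch. This is not the published ddmax algorithm; it is a minimal stand-in that grows the kept subset chunk by chunk at ever-finer granularity, with all names (`maximize`, `passes`) hypothetical:

```python
def maximize(data, passes):
    """Greedy input maximization sketch: starting from the empty input,
    add back ever-smaller chunks of the failing input `data` as long as
    the black-box test `passes` still accepts the result."""
    kept = [False] * len(data)

    def current():
        # The candidate input: kept characters, in original order.
        return "".join(c for c, k in zip(data, kept) if k)

    size = len(data)
    while size >= 1:
        changed = False
        for start in range(0, len(data), size):
            chunk = range(start, min(start + size, len(data)))
            if all(kept[i] for i in chunk):
                continue  # already part of the passing subset
            saved = [kept[i] for i in chunk]
            for i in chunk:
                kept[i] = True  # tentatively add this chunk
            if passes(current()):
                changed = True  # keep the addition
            else:
                for i, s in zip(chunk, saved):
                    kept[i] = s  # revert the addition
        if not changed:
            size //= 2  # refine granularity, as in delta debugging
    return current()
```

For example, with a checker that rejects any input containing `#`, `maximize("goo#d da#ta", lambda s: "#" not in s)` recovers `"good data"`; the dropped `#` characters are exactly the fragments that could not be processed.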
WiSer: A Highly Available HTAP DBMS for IoT Applications
In a classic transactional distributed database management system (DBMS), write transactions invariably synchronize with a coordinator before final commitment. While enforcing serializability, this model has long been criticized for not satisfying the applications' availability requirements. With the advent of the Internet of Things (IoT), this problem has become more severe, as an increasing number of applications call for the capability of hybrid transactional and analytical processing (HTAP), where aggregation constraints need to be enforced as part of transactions. Current systems work around this by creating escrows, allowing occasional overshoots of constraints, which are handled via compensating application logic.
The WiSer DBMS targets consistency with availability by splitting the database commit into two steps. First, a PROMISE step that corresponds to what humans are used to as commitment, and runs without talking to a coordinator. Second, a SERIALIZE step that fixes transactions' positions in the serializable order via a consensus procedure. We achieve this split via a novel data representation that embeds read-sets into transaction deltas, and serialization sequence numbers into table rows. WiSer does no sharding (all nodes can run transactions that modify the entire database), and yet enforces aggregation constraints. Both read-write conflicts and aggregation constraint violations are resolved lazily in the serialized data. WiSer also covers node joins and departures as database tables, thus simplifying correctness and failure handling. We present the design of WiSer as well as experiments suggesting this approach has promise.
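The two-step commit described above can be sketched in a few lines. This is a toy model, not WiSer's implementation: the `Node` class and method names are hypothetical, and a shared in-process counter stands in for the consensus procedure that assigns serialization sequence numbers:

```python
import itertools

class Node:
    """Toy sketch of a PROMISE/SERIALIZE commit split (all names hypothetical)."""

    # Stand-in for the consensus procedure that fixes the global order.
    _sequencer = itertools.count(1)

    def __init__(self):
        self.promised = []    # deltas already "committed" from the client's view
        self.serialized = []  # (sequence number, delta) pairs in global order

    def promise(self, delta):
        # Step 1: local commit, no coordinator round-trip required.
        self.promised.append(delta)
        return delta

    def serialize(self):
        # Step 2: fix each promised delta's position in the serializable order.
        for delta in self.promised:
            seq = next(Node._sequencer)
            self.serialized.append((seq, delta))
        self.promised.clear()
        return self.serialized
```

The point of the split is visible in the API: `promise` returns immediately on any node, while conflict and constraint resolution can happen lazily once `serialize` has pinned each transaction's sequence number.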
Business Analytics in (a) Blink
The Blink project's ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorp
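Blink's compression scheme is proprietary, but the idea of evaluating SQL predicates directly on compressed data can be illustrated with a generic order-preserving dictionary encoding (function names here are hypothetical): the predicate is evaluated once per distinct value on the small dictionary, and the full scan then only compares compact integer codes.

```python
def dictionary_encode(values):
    """Order-preserving dictionary encoding: each distinct value maps to a
    small integer code, so code order matches value order."""
    dictionary = sorted(set(values))
    code = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code[v] for v in values]

def scan_count(codes, dictionary, predicate):
    """Full scan with the predicate evaluated on the dictionary (once per
    distinct value), then applied to the compressed codes without decoding."""
    matches = {i for i, v in enumerate(dictionary) if predicate(v)}
    return sum(1 for c in codes if c in matches)
```

A scan over a country column, say `["DE", "US", "DE", "FR", "US", "DE"]`, touches only small integers per row; because the encoding is order-preserving, even range predicates such as `v <= "FR"` reduce to comparisons on codes. This is the kind of saving that lets a scan-everything design compete without indexes or materialized views.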